Abstract

Given a Boolean function f, the quantity ess(f) denotes the size of the largest set of assignments that falsify f, no two of which falsify a common implicate of f. Although ess(f) is clearly a lower bound on cnf_size(f) (the minimum number of clauses in a CNF formula for f), Cepek et al. showed it is not, in general, a tight lower bound [1]. They gave examples of functions f for which there is a small gap between ess(f) and cnf_size(f). We demonstrate significantly larger gaps. We show that the gap can be exponential in n for arbitrary Boolean functions, and Θ(√n) for Horn functions, where n is the number of variables of f. We also introduce a natural extension of the quantity ess(f), which we call ess_k(f): the size of the largest set of assignments, no k of which falsify a common implicate of f.
1. Introduction

Determining the smallest CNF formula for a given Boolean function f is a difficult problem that has been studied for many years. (See [2] for an overview of relevant literature.) Recently, Cepek et al. introduced a combinatorial quantity, ess(f), which lower bounds cnf_size(f), the minimum number of clauses in a CNF formula representing f [1]. The quantity ess(f) is equal to the size of the largest set of falsepoints of f, no two of which falsify the same implicate of f.¹



Email addresses: hstein@poly.edu (Lisa Hellerstein), dkletenik@cis.poly.edu 
(Devorah Kletenik) 

¹ This definition immediately follows from Corollary 3.2 of Cepek et al. [1].
For certain subclasses of Boolean functions, such as the monotone (i.e., positive) functions, ess(f) is equal to cnf_size(f). However, Cepek et al. demonstrated that there can be a gap between ess(f) and cnf_size(f). They constructed a Boolean function f on n variables such that there is a multiplicative gap of size Θ(log n) between cnf_size(f) and ess(f).² Their constructed function f is a Horn function. Their results leave open the possibility that ess(f) could be a close approximation to cnf_size(f).

We show that this is not the case. We construct a Boolean function f on n variables such that there is a multiplicative gap of size 2^{Θ(n)} between cnf_size(f) and ess(f). Note that such a gap could not be larger than 2^{n−1}, since cnf_size(f) ≤ 2^{n−1} for all functions f.

We also construct a Horn function f such that there is a multiplicative gap of size Θ(√n) between cnf_size(f) and ess(f). We show that no gap larger than O(n) is possible.

If one expresses the gaps as a function of cnf_size(f), rather than as a function of the number of variables n, then the gap we obtain with both the constructed non-Horn and Horn functions f is cnf_size(f)^{1/3}. Clearly, no gap larger than cnf_size(f) is possible.

We briefly explore a natural generalization of the quantity ess(f), which we call ess_k(f): the size of the largest set of falsepoints, no k of which falsify a common implicate of f. The quantity ess_k(f)/(k − 1) is a lower bound on CNF size, for any k ≥ 2.

The above results concern the size of CNF formulas. Analogous results 
hold for DNF formulas by duality. 

2. Preliminaries 

2.1. Definitions 

A Boolean function f(x_1, ..., x_n) is a mapping {0, 1}^n → {0, 1}. (Where it does not cause confusion, we often use the word "function" to refer to a Boolean function.) A variable x_i and its negation ¬x_i are literals (positive and negative, respectively). A clause is a disjunction (∨) of literals. A term is a conjunction (∧) of literals. A CNF (conjunctive normal form) formula is a formula of the form c_0 ∧ c_1 ∧ ..., where each c_i is a clause. A DNF (disjunctive normal form) formula is a formula of the form t_0 ∨ t_1 ∨ ... ∨ t_k, where each t_i is a term.

² Their function is actually defined in terms of two parameters n_1 and n_2. Setting them to maximize the multiplicative gap between ess(f) and cnf_size(f), as a function of the number of variables n, yields a gap of size Θ(log n).

A clause c containing variables from X_n = {x_1, ..., x_n} is an implicate of f if for all x ∈ {0, 1}^n, if c is falsified by x then f(x) = 0. A term t containing variables from X_n is an implicant of function f(x_1, ..., x_n) if for all x ∈ {0, 1}^n, if t is satisfied by x then f(x) = 1.

We define the size of a CNF formula to be the number of its clauses, and the size of a DNF formula to be the number of its terms.

Given a Boolean function f, cnf_size(f) is the size of the smallest CNF formula representing f. Analogously, dnf_size(f) is the size of the smallest DNF formula representing f.

An assignment x ∈ {0, 1}^n is a falsepoint of f if f(x) = 0, and is a truepoint of f if f(x) = 1. We say that a clause c covers a falsepoint x of f if x falsifies c. A term t covers a truepoint x of f if x satisfies t.

A CNF formula φ representing a function f forms a cover of the falsepoints of f, in that each falsepoint of f must be covered by at least one clause of φ. Further, if x is a truepoint of f, then no clause of φ covers x. Similarly, a DNF formula φ representing a function f forms a cover of the truepoints of f, in that each truepoint of f must be covered by at least one term of φ. Further, if x is a falsepoint of f, then no term of φ covers x.

Given two assignments x, y ∈ {0, 1}^n, we write x ≤ y if ∀i, x_i ≤ y_i. An assignment r separates two assignments p and q if ∀i, p_i = r_i or q_i = r_i.

A partial function f maps {0, 1}^n to {0, 1, *}, where * indicates that the value of f is not defined on the assignment. A Boolean formula φ is consistent with a partial function f if φ(a) = f(a) for all a ∈ {0, 1}^n where f(a) ≠ *. If f is a partial Boolean function, then cnf_size(f) and dnf_size(f) are the sizes of the smallest CNF and DNF formulas consistent with f, respectively.

A Boolean function f(x_1, ..., x_n) is monotone if for all x, y ∈ {0, 1}^n, if x ≤ y then f(x) ≤ f(y). A Boolean function is anti-monotone if for all x, y ∈ {0, 1}^n, if x ≥ y then f(x) ≤ f(y).

A DNF or CNF formula is monotone if it contains no negations; it is anti-monotone if all variables in it are negated. A CNF formula is a Horn-CNF if each clause contains at most one variable without a negation. If each clause contains exactly one variable without a negation, it is a pure Horn-CNF. A Horn function is a Boolean function that can be represented by a Horn-CNF. It is a pure Horn function if it can be represented by a pure Horn-CNF. Horn functions are a generalization of anti-monotone functions, and have applications in artificial intelligence [3].
We say that two falsepoints, x and y, of a function f are independent if no implicate of f covers both x and y. Similarly, we say that two truepoints x and y of a function f are independent if no implicant of f covers both x and y. We say that a set S of falsepoints (truepoints) of f is independent if all pairs of falsepoints (truepoints) in S are independent.

The set covering problem is as follows: Given a ground set A = {e_1, ..., e_m} of elements, a set 𝒮 = {S_1, ..., S_p} of subsets of A, and a positive integer k, does there exist 𝒮' ⊆ 𝒮 such that ⋃_{S ∈ 𝒮'} S = A and |𝒮'| ≤ k? Each set S_i ∈ 𝒮 is said to cover the elements it contains. Thus the set covering problem asks whether A has a "cover" of size at most k.

A set covering instance is r-uniform, for some r > 0, if all subsets S_i ∈ 𝒮 have size r.

Given an instance of the set covering problem, we say that a subset A' of ground set A is independent if no two elements of A' are contained in a common subset S_i of 𝒮.

3. The quantity ess(f) 

We begin by restating the definition of ess(f) in terms of independent falsepoints. We also introduce an analogous quantity for truepoints. (The notation ess^d refers to the fact that this is a dual definition.)

Definition 1. Let f be a Boolean function. The quantity ess(f) denotes the size of the largest independent set of falsepoints of f. The quantity ess^d(f) denotes the size of the largest independent set of truepoints of f.

As was stated above, Cepek et al. introduced the quantity ess(f) as a lower bound on cnf_size(f). The fact that ess(f) ≤ cnf_size(f) follows easily from the above definitions, and from the following facts: (1) if φ is a CNF formula representing f, then every falsepoint of f must be covered by some clause of φ, and (2) each clause of φ must be an implicate of f. Consequently, no clause of φ covers two points of an independent set of falsepoints, so φ needs a distinct clause for each of them.

Let f' denote the function that is the complement of f, i.e., f'(a) = ¬f(a) for all assignments a. Since, by duality, ess(f') = ess^d(f) and cnf_size(f') = dnf_size(f), it follows that ess(f') ≤ dnf_size(f).

Property 1. [1] Two falsepoints of f, x and y, are independent iff there exists a truepoint a of f that separates x and y.
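For readers who want to experiment, the following Python sketch computes ess(f) by brute force for functions on a few variables. It is our own illustration, not code from Cepek et al.: a function is assumed to be given as a Python predicate on bit tuples, and all helper names are hypothetical. Independence is implemented twice, once directly from the definition (enumerating all 3^n clauses and testing which ones are implicates) and once through the separation criterion of Property 1, so the two can be compared on small examples. Both versions take exponential time.

```python
from itertools import combinations, product


def assignments(n):
    """All points of {0,1}^n as tuples of bits."""
    return list(product((0, 1), repeat=n))


def falsifies(x, clause):
    """A clause has one entry per variable: None if the variable is absent,
    otherwise the bit value that makes that literal false (0 for x_i, 1 for not x_i).
    x falsifies the clause iff every literal present is false under x."""
    return all(b is None or x[i] == b for i, b in enumerate(clause))


def independent_by_definition(f, n, p, q):
    """True iff no implicate of f is falsified by both p and q."""
    pts = assignments(n)
    for clause in product((None, 0, 1), repeat=n):
        if falsifies(p, clause) and falsifies(q, clause):
            if all(f(x) == 0 for x in pts if falsifies(x, clause)):
                return False  # a common implicate covers p and q
    return True


def independent_by_separation(f, n, p, q):
    """Property 1: some truepoint r has r_i in {p_i, q_i} for every i."""
    return any(
        f(r) == 1 and all(r[i] in (p[i], q[i]) for i in range(n))
        for r in assignments(n)
    )


def ess(f, n, independent=independent_by_separation):
    """Size of a largest pairwise independent set of falsepoints (brute force)."""
    false_pts = [x for x in assignments(n) if f(x) == 0]
    for size in range(len(false_pts), 0, -1):
        for subset in combinations(false_pts, size):
            if all(independent(f, n, p, q) for p, q in combinations(subset, 2)):
                return size
    return 0


if __name__ == "__main__":
    def xor2(x):
        return x[0] ^ x[1]

    def parity3(x):
        return x[0] ^ x[1] ^ x[2]

    print(ess(xor2, 2), ess(xor2, 2, independent_by_definition))  # 2 2
    print(ess(parity3, 3))  # 4
```

On the two-variable XOR both criteria give ess = 2, which matches the 2 clauses of its minimum CNF; the subset search is only practical for very small n.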
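Consider the following decision problem, which we will call ESS: "Given a CNF formula representing a Boolean function f, and a number k, is ess(f) ≤ k?" Using Property 1, this problem is easily shown to be in co-NP: if ess(f) > k, a certificate consists of k + 1 falsepoints of f together with, for each pair of them, a truepoint of f that separates the pair, and all of these can be checked in polynomial time by evaluating the CNF formula.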
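The certificate argument can be made concrete as a checker. The sketch below is a minimal illustration under our own conventions: the CNF is a list of clauses in DIMACS-style integer notation, and a certificate is a list of claimed falsepoints plus a dictionary of separating truepoints indexed by pairs. Every step is a CNF evaluation or a coordinate comparison, so the check runs in time polynomial in the formula size and the number of points; to refute ess(f) ≤ k one supplies k + 1 falsepoints.

```python
from itertools import combinations


def evaluate_cnf(cnf, x):
    """cnf: list of clauses, each a list of nonzero ints (DIMACS style: i means x_i,
    -i means not x_i, variables 1-indexed). x: tuple of bits. Returns 0 or 1."""
    def literal_true(lit):
        bit = x[abs(lit) - 1]
        return bit == 1 if lit > 0 else bit == 0

    return int(all(any(literal_true(lit) for lit in clause) for clause in cnf))


def separates(r, p, q):
    """r agrees with p or with q in every coordinate."""
    return all(ri == pi or ri == qi for ri, pi, qi in zip(r, p, q))


def check_certificate(cnf, falsepoints, witnesses):
    """Polynomial-time check that ess(f) >= len(falsepoints), where f is the
    function of `cnf`. witnesses[(i, j)] (i < j) must be a truepoint separating
    falsepoints[i] and falsepoints[j]; by Property 1 this makes them independent."""
    if len(set(falsepoints)) != len(falsepoints):
        return False
    if any(evaluate_cnf(cnf, p) != 0 for p in falsepoints):
        return False
    for i, j in combinations(range(len(falsepoints)), 2):
        r = witnesses.get((i, j))
        if r is None or evaluate_cnf(cnf, r) != 1:
            return False
        if not separates(r, falsepoints[i], falsepoints[j]):
            return False
    return True


if __name__ == "__main__":
    xor_cnf = [[1, 2], [-1, -2]]  # x1 XOR x2
    print(check_certificate(xor_cnf, [(0, 0), (1, 1)], {(0, 1): (0, 1)}))  # True
```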

We can combine the fact that ESS is in co-NP with results on the hardness of approximating CNF-minimization, to get the following preliminary result, based on a complexity-theoretic assumption.

Proposition 1. If co-NP ≠ Σ_2^p, then for some γ > 0 there exists an infinite set of Boolean functions f such that ess(f) n^γ < cnf_size(f), where n is the number of variables of f.

Proof. Consider the Min-CNF problem (decision version): Given a CNF formula representing a Boolean function f, and a number k, is cnf_size(f) ≤ k? Umans proved that it is Σ_2^p-complete to approximate this problem to within a factor of n^γ, for some γ > 0, where n is the number of variables of f [4]. (Approximating this problem to within some factor q means answering "yes" whenever cnf_size(f) ≤ k, and answering "no" whenever cnf_size(f) > kq. If k < cnf_size(f) ≤ kq, either answer is acceptable.)

Suppose ess(f) n^γ ≥ cnf_size(f) for all Boolean functions f. Then one can approximate Min-CNF to within a factor of n^γ in co-NP by simply using the co-NP algorithm for ESS to determine whether ess(f) ≤ k: if cnf_size(f) ≤ k then ess(f) ≤ k, and if cnf_size(f) > kn^γ then ess(f) ≥ cnf_size(f)/n^γ > k. Even if ess(f) n^γ < cnf_size(f) for a finite set S of functions, one can still approximate Min-CNF to within a factor of n^γ in co-NP, by simply handling the finite number of functions in S explicitly as special cases. Since approximating Min-CNF to within this factor is Σ_2^p-complete, Σ_2^p ⊆ co-NP. By definition, co-NP ⊆ Σ_2^p, so Σ_2^p = co-NP. □

The non-approximability result of Umans for Min-CNF, used in the above proof, is expressed in terms of the number of variables n of the function. Umans also showed [5] that it is Σ_2^p-complete to approximate Min-CNF to within a factor of m^γ, for some γ > 0, where m = cnf_size(f). Thus we can also prove that, if NP ≠ Σ_2^p, then for some γ > 0, there is an infinite set of functions f such that ess(f) < cnf_size(f)^{1−γ}.

The assumption that Σ_2^p ≠ co-NP is not unreasonable, so we have grounds to believe that there is an infinite set of functions for which the gap between ess(f) and cnf_size(f) is greater than n^γ (or cnf_size(f)^γ) for some γ. Below, we will explicitly construct such sets with larger gaps than that of Proposition 1, and with no complexity theoretic assumptions.
We can also prove a proposition similar to Proposition 1 for Horn functions, using a different complexity theoretic assumption. (Since the statement of the proposition includes a complexity class parameterized by the standard input-size parameter n, we use N instead of n to denote the number of inputs to a Boolean function.)

Proposition 2. If NP ⊄ co-NTIME(n^{polylog(n)}), then for some ε such that 0 < ε < 1, there exists an infinite set of Horn functions f such that cnf_size(f)/ess(f) > 2^{log^{1−ε} N}, where N is the number of input variables of f.

Proof. Consider the following Min-Horn-CNF problem (decision version): Given a Horn-CNF φ representing a Horn function f, and an integer k > 0, is cnf_size(f) ≤ k? Bhattacharya et al. [6] showed that there exists a deterministic, many-one reduction (i.e., a Karp reduction), running in time O(n^{polylog(n)}) (where n is the size of the input), from an NP-complete problem to the problem of approximating Min-Horn-CNF to within a factor of 2^{log^{1−ε} N}, where N is the number of input variables of f.

Suppose that cnf_size(f)/ess(f) is at most 2^{log^{1−ε} N} for all Horn functions f. It is well known that given a Horn-CNF φ representing f, the size of the smallest (functionally) equivalent Horn-CNF is precisely cnf_size(f). Thus, given a Horn-CNF φ on N variables and a number k, if there does not exist a Horn-CNF equivalent to φ of size at most 2^{log^{1−ε} N} × k, this can be verified non-deterministically in polynomial time (by verifying that ess(f) > k, using a certificate as in the case of ESS). Thus the complement of Min-Horn-CNF is approximable to within a factor of 2^{log^{1−ε} N} in non-deterministic polynomial time (polynomial in n, the size in bits of the input Horn-CNF; N is the number of variables in the input Horn-CNF). Combining this fact with the reduction of Bhattacharya et al. implies that the complement of an NP-complete problem can be solved in non-deterministic time n^{polylog(n)}. Thus NP is contained in co-NTIME(n^{polylog(n)}). The same holds if cnf_size(f)/ess(f) is at most 2^{log^{1−ε} N} for all but a finite set of Horn functions f. □

4. Constructions of functions with large gaps between ess(f) and cnf_size(f)

We will begin by constructing a function f such that cnf_size(f)/ess(f) = Θ(n). This is already a larger gap than the multiplicative gap of Θ(log n) achieved by the construction of Cepek et al. [1], and the gap of n^γ in Proposition 1.
We describe the construction of f, prove bounds on cnf_size(f) and ess(f), and then prove that the ratio cnf_size(f)/ess(f) = Θ(n).



We will then show how to modify this construction to give a function f such that cnf_size(f)/ess(f) = 2^{Θ(n)}, thus increasing the gap to be exponential in n. At the end of this section, we will explore ess_k(f), our generalization of ess(f).

Theorem 1. There exists a function f on n variables such that cnf_size(f)/ess(f) = Θ(n).

Proof. We construct a function f such that dnf_size(f)/ess^d(f) = Θ(n). Theorem 1 then follows immediately by duality.

Our construction relies heavily on a reduction of Gimpel from the 1960's [7], which reduces a generic instance of the set covering problem to a DNF-minimization problem. (See Czort [8] or Allender et al. [9] for more recent discussions of this reduction.)

Gimpel's reduction is as follows. Let A = {e_1, ..., e_m} be the ground set of the set covering instance, and let 𝒮 be the set of subsets of A from which the cover must be formed. With each element e_i in A, associate a Boolean input variable x_i. For each S ∈ 𝒮, let x_S denote the assignment in {0, 1}^m where x_i = 0 iff e_i ∈ S. Define the partial function f(x_1, ..., x_m) as follows:

$$f(x) = \begin{cases} 1 & \text{if } x \text{ contains exactly } m-1 \text{ ones} \\ * & \text{if } x \geq x_S \text{ for some } S \in \mathcal{S} \text{ and } x \text{ does not contain exactly } m-1 \text{ ones} \\ 0 & \text{otherwise} \end{cases}$$

There is a DNF formula of size at most k that is consistent with this partial function if and only if the elements of the set covering instance A can be covered using at most k subsets in 𝒮 (cf. [8]).

We apply this reduction to the simple, 2-uniform set covering instance over m elements where 𝒮 consists of all subsets containing exactly two of those m elements. The smallest set cover for this instance is clearly ⌈m/2⌉. The largest independent set of elements is only of size 1, since every pair of elements is contained in a common subset of 𝒮. Note that this gives a ratio of minimal set cover to largest independent set of Θ(m).

Applying Gimpel's reduction to this simple set covering instance, we get the following partial function f:

$$f(x) = \begin{cases} 1 & \text{if } x \text{ contains exactly } m-1 \text{ ones} \\ * & \text{if } x \text{ contains exactly } m-2 \text{ ones} \\ * & \text{if } x \text{ contains exactly } m \text{ ones} \\ 0 & \text{otherwise} \end{cases}$$

Since the smallest set cover for the instance has size ⌈m/2⌉,

$$\mathrm{dnf\_size}(f) = \lceil m/2 \rceil.$$
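For small m, the partial function produced by Gimpel's reduction on the 2-uniform instance, and the two set-cover quantities quoted above, can be checked by brute force. This Python sketch follows the definitions above (x_S has x_i = 0 iff e_i ∈ S, and the value 1 takes precedence over *); the string "*" stands for an undefined value, elements are numbered from 0, and the function names are our own.

```python
from itertools import combinations, product

STAR = "*"  # value of the partial function where it is undefined


def gimpel_partial_function(m, sets):
    """Gimpel's reduction for a set system over elements 0..m-1.

    Returns a dict mapping every x in {0,1}^m to 1, 0 or STAR.
    x_S has a 0 exactly in the positions of the elements of S.
    """
    x_sets = [tuple(0 if i in s else 1 for i in range(m)) for s in sets]
    f = {}
    for x in product((0, 1), repeat=m):
        if sum(x) == m - 1:
            f[x] = 1
        elif any(all(xi >= wi for xi, wi in zip(x, w)) for w in x_sets):
            f[x] = STAR
        else:
            f[x] = 0
    return f


def min_set_cover(m, sets):
    """Size of a smallest subfamily whose union is {0, ..., m-1} (brute force)."""
    for size in range(1, len(sets) + 1):
        for chosen in combinations(sets, size):
            if set().union(*chosen) == set(range(m)):
                return size
    return None


def max_independent_elements(m, sets):
    """Largest set of elements, no two of which lie in a common set (brute force)."""
    for size in range(m, 0, -1):
        for elems in combinations(range(m), size):
            if not any({a, b} <= s for a, b in combinations(elems, 2) for s in sets):
                return size
    return 0


if __name__ == "__main__":
    m = 5
    pairs = [set(p) for p in combinations(range(m), 2)]
    f = gimpel_partial_function(m, pairs)
    values = list(f.values())
    print(values.count(1), values.count(STAR), values.count(0))  # 5 11 16
    print(min_set_cover(m, pairs), max_independent_elements(m, pairs))  # 3 1
```

The * values land exactly on the assignments with m − 2 or m ones, as in the case analysis above, and the brute-force cover size matches ⌈m/2⌉.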

Allender et al. extended the reduction of Gimpel by converting the partial function f to a total function g. The conversion is as follows:

Let t = m + 1 and let s be the number of assignments x for which f(x) = *. Let y_1 and y_2 be two additional Boolean variables, and let z = z_1 ... z_t be a vector of t more Boolean variables. Let S ⊆ {0, 1}^t be a collection of s vectors, each containing an odd number of 1's (such a collection exists, since s < 2^m and {0, 1}^t contains 2^{t−1} = 2^m vectors with an odd number of 1's). Let χ be the function such that χ(x) = 0 if the parity of x is even and χ(x) = 1 otherwise.

The total function g is defined as follows:

$$g(x, y_1, y_2, z) = \begin{cases} 1 & \text{if } f(x) = 1,\ y_1 = y_2 = 1, \text{ and } z \in S \\ 1 & \text{if } f(x) = * \text{ and } y_1 = y_2 = 1 \\ 1 & \text{if } f(x) = *,\ y_1 = \chi(x), \text{ and } y_2 = \neg\chi(x) \\ 0 & \text{otherwise} \end{cases}$$

Allender et al. proved that this total function g obeys the following property:

$$\mathrm{dnf\_size}(g) = s\,(\mathrm{dnf\_size}(f) + 1).$$

Let g be the total function obtained by applying the above definition to our partial function f.

We can now compute dnf_size(g). Let n be the number of input variables of g. The total function g is defined on n = 2m + 3 variables. Since dnf_size(f) = ⌈m/2⌉, we have

$$\mathrm{dnf\_size}(g) = s\left(\left\lceil \frac{m}{2} \right\rceil + 1\right) \geq s\left(\frac{m}{2} + 1\right),$$

where s is the number of assignments x for which f(x) = *.
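A sketch of the conversion from f to g for the 2-uniform instance, small enough to enumerate. It assumes the case definition of g given above and picks the collection S as the first s odd-parity vectors in lexicographic order; that choice is ours, and any s odd-parity vectors would serve. The usage check confirms that g has 2m + 3 variables and counts its truepoints, which should be m·s + 2s·2^t (m points with f(x) = 1, each with s choices of z, plus two y-patterns with any z for each of the s points with f(x) = *).

```python
from itertools import product

STAR = "*"


def two_uniform_f(x):
    """Gimpel's partial function for the instance of all 2-subsets of m = len(x) elements."""
    m, ones = len(x), sum(x)
    if ones == m - 1:
        return 1
    if ones in (m - 2, m):
        return STAR
    return 0


def allender_total_function(m):
    """Build the total function g(x, y1, y2, z) from the partial function above."""
    t = m + 1
    s = sum(1 for x in product((0, 1), repeat=m) if two_uniform_f(x) == STAR)
    # Illustrative choice: the first s odd-parity vectors of length t in lex order.
    odd = [z for z in product((0, 1), repeat=t) if sum(z) % 2 == 1]
    S = set(odd[:s])
    assert len(S) == s  # there are 2**(t-1) = 2**m odd-parity vectors, and s < 2**m

    def chi(x):
        return sum(x) % 2  # 0 if the parity of x is even, 1 otherwise

    def g(x, y1, y2, z):
        fx = two_uniform_f(x)
        if fx == 1 and y1 == y2 == 1 and z in S:
            return 1
        if fx == STAR and y1 == y2 == 1:
            return 1
        if fx == STAR and y1 == chi(x) and y2 == 1 - chi(x):
            return 1
        return 0

    return g, s, t


if __name__ == "__main__":
    m = 3
    g, s, t = allender_total_function(m)
    truepoints = sum(
        g(x, y1, y2, z)
        for x in product((0, 1), repeat=m)
        for y1 in (0, 1)
        for y2 in (0, 1)
        for z in product((0, 1), repeat=t)
    )
    print(m + 2 + t, s, truepoints)  # 9 4 140
```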
We will upper bound ess^d(g) by dividing the truepoints of g into two disjoint sets and upper-bounding the size of a maximum independent set of truepoints in each. (Recall that two truepoints of g are independent if they do not satisfy a common implicant of g.)

Set 1: The set of all truepoints of g whose x component has the property f(x) = *.

Let a_1 be a maximum independent set of truepoints of g consisting only of points in this set. Consider two truepoints p and q in this set that have the same x value. By the definition of g, the (y_1, y_2) values of p and q are each either (1, 1) or (χ(x), ¬χ(x)), so both set y_j = 1, where j = 1 if χ(x) = 1 and j = 2 otherwise. Let t be the term containing all variables x_i and the variable y_j, such that each x_i appears without negation if it is set to 1 by p and q, and with negation otherwise. Clearly, t is an implicant of g by definition of g, and clearly t covers both p and q. It follows that p and q are not independent.

Because any two truepoints in this set with the same x value are not independent, |a_1| cannot exceed the number of different x assignments. There are s assignments such that f(x) = *, so |a_1| ≤ s.

Set 2: The set of all truepoints of g whose x component has the property f(x) = 1.

Let a_2 be a maximum independent set consisting only of points in this set. Consider any two truepoints p and q in this set that contain the same assignment for z. Since the x components of p and q each contain exactly m − 1 ones, they share at least m − 2 ones. We can construct a term t of the form w y_1 y_2 z̃, such that w contains exactly m − 2 variables x_i that are set to 1 by both p and q, and z̃ contains every z_i: those set to 1 by p and q appear without negation, and all other z_i appear with negation. Any assignment satisfying t has at least m − 2 ones in its x component (so f(x) ∈ {1, *}), has y_1 = y_2 = 1, and has z ∈ S, so t is an implicant of g; clearly t covers both p and q. Once again, it follows that p and q are not independent truepoints of g.

Because any two truepoints in this set with the same z value are not independent, |a_2| cannot exceed the number of different z assignments. There are s assignments to z such that z ∈ S, so |a_2| ≤ s.

Since a maximum independent set of truepoints of g can be partitioned into an independent set of points from the first set and an independent set of points from the second set, it immediately follows that³

$$\mathrm{ess}^d(g) \leq |a_1| + |a_2| \leq s + s = 2s.$$

Hence, the ratio between the DNF size and ess^d(g) is:

$$\frac{\mathrm{dnf\_size}(g)}{\mathrm{ess}^d(g)} \geq \frac{s\left(\frac{m}{2} + 1\right)}{2s} = \frac{n + 1}{8} = \Omega(n).$$

□
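The two implicants used in the Set 1 and Set 2 arguments can be checked mechanically in the smallest interesting case. The sketch below rebuilds g for m = 3 (with the same illustrative choice of S, the first s odd-parity vectors), represents a term as a dictionary from variable positions to required bits, and verifies by enumerating all 2^(2m+3) = 512 assignments that each term is an implicant of g covering the intended truepoints. It is a sanity check on one instance, not a substitute for the proof; the position layout (x, then y_1, y_2, then z) and all names are our own.

```python
from itertools import combinations, product

STAR = "*"
M = 3        # number of set-cover elements (kept tiny so brute force is cheap)
T = M + 1    # length of the z-vector


def f(x):
    """Gimpel's partial function for all 2-subsets of M elements."""
    ones = sum(x)
    if ones == M - 1:
        return 1
    if ones in (M - 2, M):
        return STAR
    return 0


STARS = [x for x in product((0, 1), repeat=M) if f(x) == STAR]
ODD = [z for z in product((0, 1), repeat=T) if sum(z) % 2 == 1]
S = set(ODD[: len(STARS)])  # illustrative choice of s odd-parity vectors


def g(a):
    """a is one flat tuple: x (M bits), then y1, y2, then z (T bits)."""
    x, y1, y2, z = a[:M], a[M], a[M + 1], a[M + 2:]
    chi = sum(x) % 2
    if f(x) == 1:
        return int(y1 == y2 == 1 and z in S)
    if f(x) == STAR:
        return int(y1 == y2 == 1 or (y1 == chi and y2 == 1 - chi))
    return 0


ALL = list(product((0, 1), repeat=2 * M + 3))
TRUE = [a for a in ALL if g(a) == 1]


def satisfies(a, term):
    """A term is a dict {variable position: required bit}."""
    return all(a[i] == b for i, b in term.items())


def is_implicant(term):
    return all(g(a) == 1 for a in ALL if satisfies(a, term))


def set1_term(x):
    """All x literals, plus y_j with j = 1 if chi(x) = 1 and j = 2 otherwise."""
    term = dict(enumerate(x))
    term[M if sum(x) % 2 == 1 else M + 1] = 1
    return term


def set2_term(p, q):
    """M - 2 of the x_i that p and q both set to 1, y1 = y2 = 1, and the shared z."""
    common = [i for i in range(M) if p[i] == q[i] == 1][: M - 2]
    term = {i: 1 for i in common}
    term.update({M: 1, M + 1: 1})
    term.update({M + 2 + k: bit for k, bit in enumerate(p[M + 2:])})
    return term
```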



Note that the above function gives a class of functions satisfying the conditions of Proposition 1, for γ = 1.

Corollary 1. There exists a function f such that cnf_size(f)/ess(f) ≥ cnf_size(f)^ε for some ε > 0.

Proof. In the previous construction, f(x) = * for exactly m(m − 1)/2 + 1 points, yielding s = Θ(m²). Hence, the DNF size s(⌈m/2⌉ + 1) is Θ(m³), making the ratio between dnf_size(g) and ess^d(g) at least Ω(dnf_size(g)^{1/3}). The CNF result follows by duality. □

4.2. Constructing a function with an exponential gap



Theorem 2. There exists a function f on n variables such that cnf_size(f)/ess(f) ≥ 2^{Θ(n)}.



Proof. As before, we will reduce a set covering instance to a DNF-minimization problem involving a partial Boolean function f. However, here we will rely on a more general version of Gimpel's reduction, due to Allender et al., described in the following lemma.

Lemma 1. [5] Let 𝒮 = {S_1, ..., S_p} be a set of subsets of ground set A = {e_1, ..., e_m}. Let t > 0 and let V = {v^i : i ∈ {1, ..., m}} and W = {w^j : j ∈ {1, ..., p}} be sets of vectors from {0, 1}^t such that for all j ∈ {1, ..., p} and i ∈ {1, ..., m},

$$e_i \in S_j \iff v^i \geq w^j.$$

Let f : {0, 1}^t → {0, 1, *} be the partial function such that

$$f(x) = \begin{cases} 1 & \text{if } x \in V \\ * & \text{if } x \geq w \text{ for some } w \in W \text{ and } x \notin V \\ 0 & \text{otherwise} \end{cases}$$

Then 𝒮 has a minimum cover of size k iff dnf_size(f) = k.

³ It can actually be proved that, in fact, ess^d(g) = 2s, but details of this proof are omitted.

(Note that the construction in the above lemma is equivalent to Gimpel's if we take t = m, V = {v ∈ {0, 1}^m | v contains exactly m − 1 1's}, and W = {x_S | S ∈ 𝒮}, where x_S denotes the assignment in {0, 1}^m where x_i = 0 iff e_i ∈ S.)
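The parenthetical claim can be checked directly: with v^i the vector whose only 0 is in position i and w^j = x_{S_j}, the condition e_i ∈ S_j iff v^i ≥ w^j of Lemma 1 holds for any set system, because v^i ≥ w^j exactly when w^j has a 0 in position i. A minimal Python sketch (hypothetical helper names, elements numbered from 0):

```python
from itertools import combinations


def dominates(v, w):
    """v >= w componentwise."""
    return all(vi >= wi for vi, wi in zip(v, w))


def gimpel_vectors(m, sets):
    """V: v^i has its single 0 in position i. W: w^j = x_{S_j} has 0s exactly on S_j."""
    V = [tuple(0 if k == i else 1 for k in range(m)) for i in range(m)]
    W = [tuple(0 if k in s else 1 for k in range(m)) for s in sets]
    return V, W


def satisfies_lemma1(sets, V, W):
    """Check the hypothesis of Lemma 1: e_i in S_j iff v^i >= w^j."""
    return all(
        (i in s) == dominates(V[i], W[j])
        for j, s in enumerate(sets)
        for i in range(len(V))
    )


if __name__ == "__main__":
    m = 5
    sets = [{0}, {1, 2, 3}, {0, 4}] + [set(p) for p in combinations(range(m), 2)]
    V, W = gimpel_vectors(m, sets)
    print(satisfies_lemma1(sets, V, W))  # True
```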

As before, we use the simple 2-uniform set covering instance over m elements where 𝒮 consists of all subsets of two of those elements. The next step is to construct sets V and W satisfying the properties in the above lemma for this set covering instance. To do this, we use a randomized construction of Allender et al. that generates sets V and W from an r-uniform set covering instance, for any r > 0. This randomized construction appears in the appendix of [9], and is described in the following lemma.

Lemma 2. Let r > 0 and let 𝒮 = {S_1, ..., S_p} be a set of subsets of {e_1, ..., e_m}, where each S_j contains exactly r elements. Let t ≥ 3r(1 + ln(pm)). Let V = {v^1, ..., v^m} be a set of m vectors of length t, where each v^i ∈ V is produced by randomly and independently setting each bit of v^i to 0 with probability 1/r. Let W = {w^1, ..., w^p}, where each w^j is the bitwise AND of all v^i such that e_i ∈ S_j. Then, the following holds with probability greater than 1/2: for all j ∈ {1, ..., p} and i ∈ {1, ..., m}, e_i ∈ S_j iff v^i ≥ w^j.
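The randomized construction of Lemma 2 is straightforward to simulate. The following sketch draws each bit of each v^i as 0 with probability 1/r, forms each w^j as a bitwise AND, and then tests the conclusion of the lemma directly. Because a single draw can fail, it redraws until the test passes; the retry loop, the seeded random.Random generator and all names are our own choices, and the sketch makes no claim about the success probability.

```python
import math
import random
from itertools import combinations


def dominates(v, w):
    return all(vi >= wi for vi, wi in zip(v, w))


def satisfies_lemma1(sets, V, W):
    """e_i in S_j iff v^i >= w^j, for every i and j."""
    return all(
        (i in s) == dominates(V[i], W[j])
        for j, s in enumerate(sets)
        for i in range(len(V))
    )


def random_draw(m, sets, r, rng):
    """One draw of the construction in Lemma 2 (sets: r-element subsets of range(m))."""
    t = math.ceil(3 * r * (1 + math.log(len(sets) * m)))  # t >= 3r(1 + ln(pm))
    # each bit of each v^i is 0 with probability 1/r, independently
    V = [tuple(0 if rng.random() < 1 / r else 1 for _ in range(t)) for _ in range(m)]
    # w^j is the bitwise AND of the v^i with e_i in S_j
    W = [tuple(min(V[i][k] for i in s) for k in range(t)) for s in sets]
    return V, W


def construct(m, sets, r, seed=0, max_tries=100):
    """Redraw until the conclusion of Lemma 2 holds (a single draw may fail)."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        V, W = random_draw(m, sets, r, rng)
        if satisfies_lemma1(sets, V, W):
            return V, W
    raise RuntimeError("no valid draw within max_tries")


if __name__ == "__main__":
    m = 8
    pairs = [set(p) for p in combinations(range(m), 2)]  # the 2-uniform instance
    V, W = construct(m, pairs, r=2)
    print(len(V[0]), satisfies_lemma1(pairs, V, W))
```

The vector length grows only logarithmically in m, which is what turns the ⌈m/2⌉ cover size into an exponential function of the number of variables.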

By Lemma 2, there exist sets V and W, each consisting of vectors of length 6(1 + ln(m²(m − 1)/2)) = O(log m), satisfying the conditions of Lemma 1 for our simple 2-uniform set covering instance (here r = 2 and p = m(m − 1)/2). Let f be the partial function on O(log m) variables obtained by using these V and W in the definition of f in Lemma 1.

The DNF-size of f is the size of the smallest set cover, which is ⌈m/2⌉, and the number of variables is n = Θ(log m); hence the DNF size is 2^{Θ(n)}.

We can convert the partial function f(x) to a total function g(x) just as done in the previous section. The arguments regarding DNF-size and ess^d(g) remain the same. Hence, the DNF-size is now s(2^{Θ(n)} + 1), and ess^d(g) is again at most 2s.
The ratio between the DNF-size and ess^d(g) is therefore at least 2^{Θ(n)}. Once again, the CNF result follows. □



4.3. The quantity ess_k(f)

We say that a set S of falsepoints (truepoints) of f is a "k-independent set" if no k of the falsepoints (truepoints) in S can be covered by the same implicate (implicant) of f.

We define ess_k(f) to be the size of the largest k-independent set of falsepoints of f, and ess_k^d(f) to be the size of the largest k-independent set of truepoints of f.

If S is a k-independent set of falsepoints of f, then each implicate of f can cover at most k − 1 falsepoints in S. We thus have the following lower bound on CNF size: cnf_size(f) ≥ ess_k(f)/(k − 1).

Like ess(f), this lower bound is not tight. 
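For very small functions, ess_k(f) and the bound cnf_size(f) ≥ ess_k(f)/(k − 1) can be explored by brute force. This sketch (our own, with a clause encoded as one entry per variable: None if the variable is absent, otherwise the bit value that makes the literal false) enumerates all implicates and searches for the largest set of falsepoints in which every implicate covers at most k − 1 points. For the three-variable AND function, whose minimum CNF has 3 clauses, it gives ess_2 = 3 and ess_3 = 4.

```python
from itertools import combinations, product


def falsifies(x, clause):
    """clause: one entry per variable, None (absent) or the bit that makes the literal false."""
    return all(b is None or x[i] == b for i, b in enumerate(clause))


def implicate_covers(f, n):
    """For every implicate of f, the set of falsepoints it covers."""
    pts = list(product((0, 1), repeat=n))
    covers = []
    for clause in product((None, 0, 1), repeat=n):
        covered = [x for x in pts if falsifies(x, clause)]
        if all(f(x) == 0 for x in covered):
            covers.append(frozenset(covered))
    return covers


def ess_k(f, n, k):
    """Size of a largest set of falsepoints, no k of which share an implicate."""
    covers = implicate_covers(f, n)
    false_pts = [x for x in product((0, 1), repeat=n) if f(x) == 0]
    for size in range(len(false_pts), 0, -1):
        for subset in combinations(false_pts, size):
            chosen = set(subset)
            if all(len(chosen & c) <= k - 1 for c in covers):
                return size
    return 0


if __name__ == "__main__":
    def and3(x):
        return x[0] & x[1] & x[2]  # its minimum CNF x1 ∧ x2 ∧ x3 has 3 clauses

    print(ess_k(and3, 3, 2))  # 3, so the bound gives cnf_size >= 3
    print(ess_k(and3, 3, 3))  # 4, so the bound gives cnf_size >= 4/2 = 2
```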

Theorem 3. For any arbitrary 2 ≤ k ≤ h(n), where h(n) = Ω(n), there exists a function f on n variables such that the gap between cnf_size(f) and ess_k(f)/(k − 1) is at least 2^{Θ(n)}.

Proof. Consider the k-uniform set cover instance consisting of all subsets of {e_1, ..., e_m} of size k. Construct V and W randomly using the construction from the appendix of [9] described in Lemma 2, and define a corresponding partial function f, as in Lemma 1. Note that, according to the definition of f, for any k distinct values of i ∈ {1, ..., m} there is some j ∈ {1, ..., p} such that v^i ≥ w^j for all k of them (namely, the j for which S_j consists of exactly those k elements), so any k truepoints of f are covered by a common implicant of f. The maximum size k-independent set of truepoints of f therefore consists of k − 1 truepoints.

We can convert the partial function f to a total function g according to the construction detailed in Section 4.1. Once again, we introduce s new truepoints such that f(x) = *, yielding a maximum of s pairwise independent truepoints. The definition of k-independence, however, allows k − 1 "copies" of these truepoints that differ in the assignments to z for each of the s points. Hence, the largest k-independent set of these points can contain at most s(k − 1) points.

We showed above that at most k − 1 truepoints of f (i.e., assignments x with f(x) = 1) can form a k-independent set. Once again, when we consider the s possible z portions of the corresponding truepoints of g, no two of which can be covered by a common implicant, we can include a total of at most s(k − 1) of these truepoints. Hence, the largest k-independent set for points of this type is of size no greater than s(k − 1). Since every truepoint of g is of one of these two types,

$$\mathrm{ess}_k^d(g) \leq 2s(k - 1).$$

The lower bound on DNF size, ess_k^d(g)/(k − 1), is, for this g, at most 2s(k − 1)/(k − 1) = 2s. The ratio between the actual DNF size and this lower bound is

$$\frac{s\left(2^{\Theta(n)} + 1\right)}{2s} \geq 2^{\Theta(n)}.$$

The CNF result clearly follows. □



5. Size of the gap for Horn Functions 

Because Horn-CNFs contain at most one unnegated variable per clause, they can be expressed as implications; e.g., ¬a ∨ b is equivalent to a → b. Moreover, a conjunction of several clauses that have the same antecedent can be represented as a single meta-clause, where the antecedent is the antecedent common to all the clauses and the consequent is the conjunction of all the consequents; e.g., (a → b) ∧ (a → c) can be represented as a → (b ∧ c).
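The regrouping into meta-clauses, and its inverse, can be sketched as follows. The representation is our own: a pure Horn clause is a pair (antecedent, consequent), with the antecedent a set of variable names and the consequent a single variable; clauses with no unnegated variable are not handled. Expanding a meta-clause yields one clause per variable in its consequent, so a meta-clause over n variables corresponds to at most n ordinary clauses.

```python
from collections import defaultdict


def to_meta_clauses(clauses):
    """Group pure Horn clauses (antecedent, consequent) by antecedent.

    Each antecedent is an iterable of variable names; the consequent is one variable.
    Returns {frozenset(antecedent): set of consequents}.
    """
    meta = defaultdict(set)
    for antecedent, consequent in clauses:
        meta[frozenset(antecedent)].add(consequent)
    return dict(meta)


def expand(meta):
    """Inverse direction: one Horn clause per consequent of each meta-clause."""
    return {(antecedent, c) for antecedent, consequents in meta.items() for c in consequents}


if __name__ == "__main__":
    clauses = [({"a"}, "b"), ({"a"}, "c"), ({"b", "c"}, "d")]
    meta = to_meta_clauses(clauses)
    print(len(meta), len(expand(meta)))  # 2 3
```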

5.1. Bounds on the ratio between cnf_size(f) and ess(f)

Angluin, Frazier and Pitt [10] presented an algorithm (henceforth: the AFP algorithm) to learn Horn-CNFs, where the output is a series of meta-clauses. It can be proven [11], [12] that the output of the algorithm is of minimum implication size (henceforth: min_imp(f)); that is, it contains the fewest meta-clauses needed to represent function f. Each meta-clause is equivalent to a conjunction of at most n clauses, one for each variable in its consequent. Therefore,

$$\mathrm{cnf\_size}(f) \leq n \times \mathrm{min\_imp}(f).$$

The learning algorithm maintains a list of negative and positive examples (falsepoints and truepoints of the Horn function, respectively), containing at most min_imp(f) examples of each.

Lemma 3. The set of negative examples maintained by the AFP algorithm 
is an independent set. 



Proof. The proof for this lemma relies heavily on [11]; see there for further details.

Let us consider any two negative examples, n_i and n_j, maintained by the algorithm. There are two possibilities:

1. n_i < n_j or n_j < n_i. (These two examples are comparable points; one is below the other on the Boolean lattice.)

2. n_i and n_j are incomparable points. (Neither is below the other on the lattice.)

Let us consider the first type of points. Without loss of generality, assume that n_i < n_j. Arias et al. define a positive example n_i* for each negative example n_i. This example n_i* has several unique properties; amongst them, that n_i < n_i* for all negative examples n_i (Section 3 in [11]). They further prove (Lemma 6 in [11]) that if n_i < n_j, then n_i* < n_j as well. Hence, any attempt to falsify both falsepoints, n_i and n_j, with a common implicate of the Horn function would falsify the positive example n_i* that lies between them as well. Therefore, these two points are independent.

Now let us assume that n_i and n_j are incomparable. Any implicate that falsifies both points is composed of variables on which the two points agree. Clearly, this implicate would likewise cover a point that is the componentwise intersection of n_i and n_j. However, Arias et al. prove (Lemma 7 in [11]) that n_i ∧ n_j is a positive point if n_i and n_j are incomparable. Hence, any implicate that falsifies both n_i and n_j would likewise falsify the truepoint n_i ∧ n_j that lies between them. Therefore, these two points cannot be falsified by the same implicate, and they are independent. □

Theorem 4. For any Horn function f, cnf_size(f)/ess(f) ≤ n.

Proof. For any Horn function f, the AFP algorithm maintains a set of negative examples of size min_imp(f) (one for each meta-clause of its output), and by Lemma 3 these examples are pairwise independent. Hence, ess(f) ≥ min_imp(f). We have already stated that cnf_size(f) ≤ n × min_imp(f). Hence, cnf_size(f) ≤ n × ess(f).

Moreover, since Lemma 3 holds for general Horn functions in addition to pure Horn functions [12], this bound holds for all Horn functions. □



5.2. Constructing a Horn function with a large gap between ess(f) and cnf_size(f)

Theorem 5. There exists a definite Horn function f on n variables such that cnf_size(f)/ess(f) ≥ Ω(√n).

Proof. Consider the 2-uniform set covering instance over k elements consisting of all subsets of two elements. We can construct a definite Horn formula φ corresponding to this set covering instance according to the construction in [13], with modifications based on [6].
The formula φ will contain 3 types of variables:

• Element variables: There is a variable x_i for each of the k elements.

• Set variables: There is a variable s_j for each of the C(k, 2) = k(k − 1)/2 subsets.

• Amplification variables: There are t variables z_1, ..., z_t.

The clauses in φ fall into the following 3 groups:

• Witness clauses: There is a clause s_j → x_i for each subset and for each element that the subset covers. There are 2·C(k, 2) such clauses.

• Feedback clauses: There is a clause x_1 ... x_k → s_j for each subset. There are C(k, 2) such clauses.

• Amplification clauses: There is a clause z_h → s_j for every h ∈ {1, ..., t} and for every subset. There are t·C(k, 2) such clauses.
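The formula φ can be generated mechanically, which makes the three clause counts concrete. The variable names (x1, ..., xk for elements, s1_2 for the subset {e_1, e_2} and so on, and z1, ..., zt) are hypothetical labels of ours, and clauses are stored as (antecedent, consequent) implications.

```python
from itertools import combinations


def build_phi(k, t):
    """Witness, feedback and amplification clauses for all 2-subsets of k elements.

    Each clause is (antecedent tuple, consequent), read as an implication.
    """
    elements = [f"x{i}" for i in range(1, k + 1)]
    subsets = list(combinations(range(1, k + 1), 2))
    set_var = {pair: f"s{pair[0]}_{pair[1]}" for pair in subsets}
    witness = [((set_var[pair],), f"x{i}") for pair in subsets for i in pair]
    feedback = [(tuple(elements), set_var[pair]) for pair in subsets]
    amplification = [
        ((f"z{h}",), set_var[pair]) for h in range(1, t + 1) for pair in subsets
    ]
    return witness, feedback, amplification


if __name__ == "__main__":
    witness, feedback, amplification = build_phi(k=6, t=4)
    print(len(witness), len(feedback), len(amplification))  # 30 15 60
```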



It follows from [13] that any minimum CNF for this function must contain all witness and feedback clauses, along with tc amplification clauses, where c is the size of the smallest set cover.

This particular function f has a minimum set cover of size k/2 (taking k even); hence,

$$\mathrm{cnf\_size}(f) = 2\binom{k}{2} + \binom{k}{2} + t\,\frac{k}{2}.$$

We will upper bound ess(f) by dividing the falsepoints of / into three 
disjoint sets and finding the maximum independent set for each. 

Set 1: The set of all falsepoints of f that contain at least one x_i = 0 for i ∈ {1, ..., k} and some s_j = 1 for a subset s_j that covers x_i.

Let a_1 be the largest independent set of falsepoints of f consisting of points in this set. These points can be covered by implicates of the form s_j → x_i, of which there are 2·C(k, 2). We will define the function f' whose falsepoints are just the Set 1 points. Since these points are covered by the s_j → x_i implicates, cnf_size(f') is no more than the number of s_j → x_i implicates. We have earlier said that ess(f') ≤ cnf_size(f'); hence it follows that ess(f') ≤ 2·C(k, 2). Every implicate of f' is also an implicate of f, so a_1 is also an independent set of falsepoints of f', and |a_1| ≤ ess(f'). Hence, a_1 can contain no more than 2·C(k, 2) points.
Set 2: The set of all falsepoints that are not in the first set, have x_i = 1 for all i ∈ {1, ..., k}, and have at least one s_j = 0 for some j ∈ {1, ..., C(k, 2)}.

Let a_2 be the largest independent set of falsepoints of f consisting of points in this set. These points can be covered by implicates of the form x_1 ... x_k → s_j. There are C(k, 2) such implicates. Hence, by the same argument as above, a_2 can contain no more than C(k, 2) points.

Set 3: The set of all falsepoints that are not in the first two sets, and therefore have z_h = 1 for some h ∈ {1, ..., t}, x_i = 0 for some i ∈ {1, ..., k}, and s_j = 0 for all subsets s_j covering x_i.

Let a_3 be the largest independent set of falsepoints of f consisting of points in this set. Let us fix h = 1. Consider a falsepoint p in this set with z_h = 1 and x_i = 0 for at least one i ∈ {1, ..., k}. If p contained an s_j = 1 such that the subset s_j covers x_i, then p would be a point in the first set. Hence, the only points of this form in this set have s_j = 0 for all k − 1 subsets s_j that cover x_i.

Now consider another falsepoint q in this set, with z_h = 1 and x_a = 0 for at least one a ∈ {1, ..., k}. Once again, the only points in this set must set s_b = 0 for all k − 1 subsets s_b that cover x_a.

Because the set covering instance includes a set for each pair of elements, there exists some s_j that covers both x_i and x_a. By the previous argument, s_j is set to 0 in all assignments in this set that set x_i = 0 or x_a = 0. For a fixed h, all of these points can be covered by the implicate z_h → s_j. Hence, points p and q are not independent.

In fact, any two falsepoints that are not in the first set and contain z_h = 1 for the same h and at least one x_i = 0 are not independent. Because there are t values of h, a_3 therefore has size at most t.

The largest independent set for all falsepoints cannot exceed the sum of the independent sets for these three disjoint sets; hence

$$\mathrm{ess}(f) \leq |a_1| + |a_2| + |a_3| \leq 2\binom{k}{2} + \binom{k}{2} + t.$$

The gap between cnf_size(f) and ess(f) is

$$\frac{\mathrm{cnf\_size}(f)}{\mathrm{ess}(f)} \geq \frac{3\binom{k}{2} + t\,(k/2)}{3\binom{k}{2} + t}.$$
Let us set t = 3·C(k, 2). The gap is now

$$\frac{\mathrm{cnf\_size}(f)}{\mathrm{ess}(f)} \geq \frac{t\,(1 + k/2)}{2t} = \Theta(k).$$

We have k element variables, C(k, 2) set variables, and 3·C(k, 2) amplification variables, yielding n = Θ(k²) variables in total. The gap between cnf_size(f) and ess(f) is therefore Ω(√n). □
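The arithmetic at the end of the proof can be tabulated for a few even values of k. The sketch only evaluates the expressions derived above (cnf_size(f) = 3C(k, 2) + t·k/2, the upper bound 3C(k, 2) + t on ess(f), and n = k + C(k, 2) + t, with t = 3C(k, 2)); it does not compute cnf_size(f) or ess(f) independently. The ratio equals (k + 2)/4, and printing it against √n illustrates the Θ(√n) behavior.

```python
from math import comb, sqrt


def theorem5_quantities(k):
    """Evaluate the expressions from the proof of Theorem 5 with t = 3 * C(k, 2), k even."""
    pairs = comb(k, 2)
    t = 3 * pairs
    cnf_size = 3 * pairs + t * (k // 2)  # witness + feedback + (k/2) * t amplification clauses
    ess_upper = 3 * pairs + t            # bound on |a_1| + |a_2| + |a_3|
    n = k + pairs + t                    # element + set + amplification variables
    return cnf_size, ess_upper, n


if __name__ == "__main__":
    for k in (4, 10, 20, 40):
        cnf_size, ess_upper, n = theorem5_quantities(k)
        ratio = cnf_size / ess_upper
        print(k, n, ratio, ratio / sqrt(n))
```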

We earlier posited that if Σ_2^p ≠ co-NP, there exists an infinite set of functions for which cnf_size(f)/ess(f) > cnf_size(f)^γ for some γ > 0. We can now prove a stronger theorem:

Theorem 6. There exists an infinite set of Horn functions f for which cnf_size(f)/ess(f) = Ω(cnf_size(f)^{1/3}).

Proof. See the construction above. Because cnf_size(f) = Θ(k³) and cnf_size(f)/ess(f) = Ω(k), it follows that cnf_size(f)/ess(f) = Ω(cnf_size(f)^{1/3}). □